Conversation
Force-pushed from 19a5443 to ec316bf.
This adds the WORKER_TYPE setting. The default value is 'pulpcore'. When 'redis' is selected, the tasking system uses Redis to lock resources. Redis workers produce less load on the PostgreSQL database.
closes: pulp#7210
Generated By: Claude Code.
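For reference, a minimal sketch of opting in. The file path is just the conventional Pulp settings location and the exact deployment mechanism (Dynaconf settings file vs. environment variable) is an assumption here:

```python
# /etc/pulp/settings.py (illustrative location)

# Default: the existing PostgreSQL-based tasking system.
WORKER_TYPE = "pulpcore"

# Opt in to Redis-based resource locking instead.
# WORKER_TYPE = "redis"
```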
Force-pushed from da96f66 to dd46ad1.
Added Redis connection checks to the worker so it shuts down if the connection is broken.
gerrod3 left a comment:
There's still a lot I haven't deeply reviewed yet, but this was getting long and I had a big idea around dispatch that I want to discuss.
pulpcore/tasking/redis_tasks.py (Outdated)
```python
current_app = AppStatus.objects.current()
if current_app:
    _logger.info(
        "TASK EXECUTION: Task %s being executed by %s (app_type=%s)",
        task.pk,
        current_app.name,
        current_app.app_type,
    )
else:
    _logger.info(
        "TASK EXECUTION: Task %s being executed with no AppStatus.current()", task.pk
    )
```
Is this needed? Can this be moved to log_task_start? The value should be set on the task object after set_running is called.
This is not needed. This was added when I was doing some debugging. I am going to remove this logging.
```python
finally:
    # Safety net: if we crashed before reaching the lock release above,
    # still try to release locks here (e.g., if crash during task execution)
    if safe_release_task_locks(task):
```
I'm pretty sure this is wrong. finally always runs regardless of exception or returning early.
The comment is not right, but the finally block is still needed in certain exception scenarios. I'll update the comment to reflect this.
```python
def execute_task(task):
    """Redis-aware task execution that releases Redis locks for immediate tasks."""
    # This extra stack is needed to isolate the current_task ContextVar
    contextvars.copy_context().run(_execute_task, task)
```
Reading through this version and the base version, there is not much different between the two besides the fact that this one calls safe_release_task_locks. Could this be a wrapper of the original with a try/finally to release the Redis locks?
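A rough sketch of the wrapper being suggested, assuming the base implementation is importable from pulpcore.tasking.tasks and reusing this PR's safe_release_task_locks helper (the module paths are assumptions, not the actual layout):

```python
# Hypothetical wrapper: reuse the stock execution path, only adding the Redis
# lock release in a finally block.
from pulpcore.tasking import tasks as base_tasks  # assumed location of the base execute_task


def execute_task(task):
    """Redis-aware task execution: run the original, always release Redis locks."""
    try:
        base_tasks.execute_task(task)
    finally:
        # Runs on success, on exception, or on early return.
        safe_release_task_locks(task)
```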
```python
current_app = AppStatus.objects.current()
lock_owner = current_app.name if current_app else f"immediate-{task.pk}"
```
Probably same question as before in _execute_task, but when is this ever None?
It should never be None. Once again, this is a remnant of code left over from debugging initial issues.
```python
except Exception:
    # Exception during execute_task()
    # Atomically release all locks as safety net
    safe_release_task_locks(task, lock_owner)
```
Not needed, execute_task should already handle letting go of the locks.
That is correct. I'll update the comment to more accurately state that the except block is for the case where using_workdir() fails before the execute function gets a chance to run and release locks itself.
Can you create a redis_using_workdir(task, app_lock) that handles this in a finally block?
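Something along these lines, perhaps. using_workdir and safe_release_task_locks are the helpers referenced above; their exact signatures and the meaning of app_lock as the lock owner are assumptions:

```python
from contextlib import contextmanager


@contextmanager
def redis_using_workdir(task, app_lock):
    """Working-directory context that guarantees Redis lock release on exit."""
    try:
        # If using_workdir() itself raises, the finally below still releases the locks.
        with using_workdir():
            yield
    finally:
        # app_lock doubles as the Redis lock owner here (assumption).
        safe_release_task_locks(task, app_lock)
```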
```python
task = Task.objects.create(**task_payload)
if execute_now:
    # Try to atomically acquire task lock and resource locks
    # are_resources_available() now acquires ALL locks atomically
    if are_resources_available(task):
```
Crazy idea, not sure if you want to try it: how about trying to acquire the Redis locks before the task hits the DB?
In the default worker scenario, dispatch acquires the lock on the task on creation because app_lock is set to the current task dispatcher (usually the API worker), and the task worker's fetch_task can only select from tasks where this field is null.
The new redis worker selects from any task that is waiting, so there is a time window between the task object hitting the DB and line 470's are_resources_available that you have to account for inside dispatch. Instead, we can acquire the task lock first, then create the task, then try to acquire the task's needed resource locks; if successful, execute, else defer, and finally do a safe release of the task lock. This way dispatch shouldn't be fighting against task workers to get the task lock.
```diff
-task = Task.objects.create(**task_payload)
-if execute_now:
-    # Try to atomically acquire task lock and resource locks
-    # are_resources_available() now acquires ALL locks atomically
-    if are_resources_available(task):
+# note that the pulp_id is set once the object is instantiated even if not saved to the DB yet!
+task = Task(**task_payload)
+# new function to just acquire the lock on the task
+acquire_lock(task.pulp_id)
+task.save()
+if execute_now:
+    # Change this function to only get the task's resources since we already hold the task lock
+    if are_resources_available(resources):
+        # now guaranteed to have task + resource locks
```
I like this idea. I would be interested in implementing it as a follow-up PR.
OK, instead of pre-acquiring the task lock before it hits the DB, we can just do what the normal worker does and have fetch_task only select tasks that have app_lock=None. Then we can remove all this task-refreshing logic because there won't be any worker able to grab the lock before us.
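A hedged sketch of the shape being proposed, with names taken from the snippets quoted in this review; treat the exact fields, import locations, and helpers as assumptions:

```python
# Assumed import locations for the sketch.
from pulpcore.app.models import AppStatus, Task
from pulpcore.constants import TASK_STATES

# In dispatch(): claim the task at creation time, like the normal worker does,
# by pointing app_lock at the current dispatcher.
task = Task.objects.create(app_lock=AppStatus.objects.current(), **task_payload)
if execute_now:
    # Only the resource locks are contested now; no task worker can steal the task.
    if are_resources_available(task):
        execute_task(task)

# In the redis worker's fetch_task(): skip tasks still claimed by a dispatcher.
candidates = Task.objects.filter(state=TASK_STATES.WAITING, app_lock=None)
```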
gerrod3 left a comment:
Lots of comment/logging statements to remove. Need more specificity on the try/except blocks. And finally there are gaps in the task logic that need to be addressed.
```lua
local resource_name = ARGV[2 + i]

-- Remove from set
local removed = redis.call("srem", key, lock_owner)
```
This call can fail if the item at key is no longer a set, i.e. is now a string for an exclusive lock.
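To make the failure concrete, a small redis-py reproduction (key and owner names are illustrative); the fix in the Lua script would be a type check before the SREM:

```python
import redis

r = redis.Redis(decode_responses=True)
key, owner = "lock:example-resource", "worker-1"

r.set(key, owner)       # exclusive lock: the key now holds a plain string
try:
    r.srem(key, owner)  # SREM against a string key fails
except redis.ResponseError as exc:
    print(exc)          # WRONGTYPE Operation against a key holding the wrong kind of value
finally:
    r.delete(key)

# In the Lua script the guard would look roughly like:
#   if redis.call("type", key).ok == "set" then redis.call("srem", key, lock_owner) end
```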
```python
# Determine lock owner
if lock_owner is None:
    current_app = AppStatus.objects.current()
    lock_owner = current_app.name if current_app else f"immediate-{task.pk}"
```
Is this ever set to immediate-{task.pk}? Shouldn't AppStatus.objects.current() always return an object?
```python
# Determine lock owner
if lock_owner is None:
    current_app = await sync_to_async(AppStatus.objects.current)()
    lock_owner = current_app.name if current_app else f"immediate-{task.pk}"
```
| return ["error"] # Return non-empty list to indicate failure | ||
|
|
||
|
|
||
| def release_resource_locks(redis_conn, lock_owner, task_lock_key, resources, shared_resources=None): |
```diff
-def release_resource_locks(redis_conn, lock_owner, task_lock_key, resources, shared_resources=None):
+def release_resource_locks(redis_conn, lock_owner, task_lock_key, resources=None, shared_resources=None):
```
```python
# Log debug for successful releases
num_released_exclusive = len(exclusive_resources) - len(not_owned_exclusive)
num_released_shared = len(shared_resources) - len(not_in_shared)
if num_released_exclusive > 0:
    _logger.debug("Released %d exclusive lock(s)", num_released_exclusive)
if num_released_shared > 0:
    _logger.debug("Released %d shared lock(s)", num_released_shared)
if not task_lock_not_owned:
    _logger.debug("Released task lock %s", task_lock_key)
```
Do we still need these debug logs?
```python
    than 5 seconds, then subtracts the number of active workers to get the
    number of tasks waiting to be picked up by workers.
    """
    # Calculate the cutoff time (5 seconds ago)
```
Can you remove the inline comments from this method? The code should speak for itself.
```python
def _release_resource_locks(self, task_lock_key, resources, shared_resources=None):
    """
    Atomically release task lock and resource locks.

    Uses a Lua script to ensure we only release locks that we own.

    Args:
        task_lock_key (str): Redis key for the task lock (e.g., "task:{task_id}")
        resources (list): List of exclusive resource names to release locks for
        shared_resources (list): Optional list of shared resource names
    """
    release_resource_locks(
        self.redis_conn, self.name, task_lock_key, resources, shared_resources
    )
```
Remove this method, it doesn't do anything meaningful.
| """ | ||
| # Query waiting tasks, sorted by creation time, limited | ||
| waiting_tasks = ( | ||
| Task.objects.filter(state=TASK_STATES.WAITING) |
```diff
-        Task.objects.filter(state=TASK_STATES.WAITING)
+        Task.objects.filter(state=TASK_STATES.WAITING, app_lock=None)
```
OK, thinking about it some more, this is super important! We need to change the dispatch and set_running code that handles app_lock back to how the normal worker does it. So much of our task logic depends on this, and changing it is a fool's errand. cleanup_redis_locks_for_worker is completely inoperable without app_lock being correct!
```python
        ):
            self.ignored_task_ids.remove(pk)

    def cleanup_redis_locks_for_worker(self, app_worker):
```
See my comment about fixing the app_lock logic. This method currently can't be reasoned about or expected to work correctly without app_lock behaving exactly like the normal pulpcore worker.
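For context, a hedged sketch of what the cleanup could look like once app_lock is maintained the same way the normal pulpcore worker maintains it. Helper and field names follow the snippets quoted above; the exclusive/shared resource split is simplified and the query is an assumption:

```python
def cleanup_redis_locks_for_worker(self, app_worker):
    """Release Redis locks still held on behalf of a worker that went missing."""
    # With app_lock tracked like the normal pulpcore worker, the dead worker's
    # tasks can be queried directly instead of being reconstructed from Redis.
    stuck_tasks = Task.objects.filter(
        app_lock=app_worker, state__in=[TASK_STATES.WAITING, TASK_STATES.RUNNING]
    )
    for task in stuck_tasks:
        release_resource_locks(
            self.redis_conn,
            app_worker.name,
            f"task:{task.pk}",
            resources=list(task.reserved_resources_record or []),  # exclusive/shared split omitted
        )
```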